Towards High-Frequency Tracking and Fast Edge-Aware Optimization
Computer vision has seen tremendous success in refashioning cameras from mere recording equipment into devices that can measure, understand, and sense their surroundings, and efficient algorithms have become essential both for processing the vast amounts of image data such devices generate and for enabling real-time applications like augmented reality/virtual reality (AR/VR). This dissertation advances the state of the art for AR/VR tracking systems by increasing the tracking frequency by orders of magnitude, and it proposes an efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, one in which the physical and digital worlds coexist, and we are on the cusp of a radical change in how humans perform and interact with computing, driven by major advances in hardware and in the tracking, rendering, and display algorithms needed to enable AR/VR. Humans are sensitive to small misalignments between the real and the virtual world, so tracking at kilohertz frequencies becomes essential. Current vision-based systems fall short, as their tracking frequency is implicitly limited by the frame rate of the camera. This thesis presents a prototype system that tracks at frequencies orders of magnitude higher than state-of-the-art methods using multiple commodity cameras. The proposed system exploits characteristics of the camera traditionally considered flaws, namely rolling shutter and radial distortion. The experimental evaluation shows the effectiveness of the method for various degrees of motion.
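To see why rolling shutter, usually treated as an artifact, can raise the tracking rate, note that each image row in a rolling-shutter camera is read out at a slightly different time, so a single frame contains many distinct time samples. The sketch below illustrates this arithmetic only; the numbers and the linear-readout assumption are illustrative and not taken from the dissertation.

```python
# Hedged sketch: sub-frame timestamps from a rolling-shutter camera.
# Assumes a linear row readout spanning the whole frame period.

def row_timestamp(frame_start, row, num_rows, readout_time):
    """Capture time of a given row under a linear rolling-shutter model."""
    return frame_start + (row / num_rows) * readout_time

fps = 30.0                 # commodity camera frame rate
num_rows = 480             # image height in rows
readout_time = 1.0 / fps   # assume readout fills the frame period

# Consecutive rows are separated by readout_time / num_rows seconds,
# so the camera effectively samples the scene at fps * num_rows Hz.
row_dt = readout_time / num_rows
effective_rate_hz = 1.0 / row_dt
print(effective_rate_hz)   # 14400.0 -> kilohertz-scale sampling at 30 fps
```

Under these assumptions a plain 30 fps camera already yields row-level observations at 14.4 kHz, which is the kind of budget a kilohertz tracker can exploit.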
Furthermore, edge-aware optimization is an indispensable tool in the computer vision arsenal for accurate filtering of depth data and for image-based rendering, which are increasingly used for content creation and geometry processing in AR/VR. As applications demand ever-higher resolution and speed, there is a need for methods that scale accordingly. This dissertation proposes such an edge-aware optimization framework, one that is efficient, accurate, and scales well algorithmically, a combination of desirable traits not found jointly in the state of the art. The experiments show the effectiveness of the framework in a multitude of computer vision tasks such as computational photography and stereo.
Comment: PhD thesis
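As a point of reference for what "edge-aware optimization" means here: a classic formulation smooths a signal while leaving discontinuities intact by minimizing sum_i (u_i - f_i)^2 + lam * sum_i w_i (u_{i+1} - u_i)^2, where the weights w_i are near zero across edges of a guide signal. The 1-D sketch below solves this textbook weighted-least-squares baseline with a tridiagonal (Thomas) solver; the dissertation's actual framework is a different, faster algorithm, so this is only an illustration of the problem class.

```python
# Hedged sketch: 1-D edge-aware smoothing via weighted least squares.
# Solves the tridiagonal normal equations (I + lam * D^T W D) u = f.

def edge_aware_smooth_1d(f, w, lam):
    """Smooth f while preserving edges where w[i] ~ 0."""
    n = len(f)
    main = [1.0] * n            # main diagonal
    upper = [0.0] * (n - 1)     # super-diagonal
    for i in range(n - 1):
        main[i] += lam * w[i]
        main[i + 1] += lam * w[i]
        upper[i] = -lam * w[i]
    lower = upper[:]            # system is symmetric
    # Thomas algorithm: forward elimination, then back substitution.
    c = upper[:]
    d = list(f)
    for i in range(1, n):
        m = lower[i - 1] / main[i - 1]
        main[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    u = [0.0] * n
    u[-1] = d[-1] / main[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / main[i]
    return u

# Noisy step signal; setting w = 0 across the jump preserves the edge.
f = [0.0, 0.2, -0.2, 0.1, 5.0, 5.2, 4.8, 5.1]
w = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
u = edge_aware_smooth_1d(f, w, 10.0)
```

Each half of the signal is flattened toward its mean while the step between index 3 and 4 survives, which is exactly the behavior depth-map filtering relies on.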
A Practical Stereo Depth System for Smart Glasses
We present the design of a productionized end-to-end stereo depth sensing
system that does pre-processing, online stereo rectification, and stereo depth
estimation with a fallback to monocular depth estimation when rectification is
unreliable. The output of our depth sensing system is then used in a novel view
generation pipeline to create 3D computational photography effects using
point-of-view images captured by smart glasses. All these steps are executed
on-device within the stringent compute budget of a mobile phone, and because we
expect users to own a wide range of smartphones, our design needs to be
general and cannot depend on particular hardware or an ML accelerator such
as a smartphone GPU. Although each of these steps is well studied, a
description of a practical system is still lacking. For such a system, all
these steps need to work in tandem with one another and fall back gracefully on
failures within the system or on less-than-ideal input data. We show how we handle
unforeseen changes to calibration, e.g., due to heat, robustly support depth
estimation in the wild, and still abide by the memory and latency constraints
required for a smooth user experience. We show that our trained models are
fast, running in less than 1 s on the CPU of a six-year-old Samsung Galaxy S8.
Our models generalize well to unseen data and achieve good results on
Middlebury and on in-the-wild images captured from the smart glasses.
Comment: Accepted at CVPR202
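The fallback structure the abstract describes can be summarized as: attempt online rectification, score its reliability, run stereo when the score is good, and otherwise degrade gracefully to the monocular estimator. The sketch below shows only that control flow; the function names, the reliability score, and the threshold are illustrative placeholders, not the paper's API.

```python
# Hedged sketch of a stereo-with-monocular-fallback dispatcher.

def estimate_depth(left, right, rectify, stereo_depth, mono_depth,
                   reliability_threshold=0.5):
    """Run stereo when rectification is trustworthy, else fall back."""
    rect_left, rect_right, reliability = rectify(left, right)
    if reliability >= reliability_threshold:
        return stereo_depth(rect_left, rect_right), "stereo"
    # Graceful degradation: rectification is unreliable (e.g. calibration
    # drifted with heat), so use the single-image estimator instead.
    return mono_depth(left), "monocular"

# Stubs standing in for the real (learned) models.
good_rectify = lambda l, r: (l, r, 0.9)   # confident rectification
bad_rectify = lambda l, r: (l, r, 0.1)    # calibration has drifted
stereo = lambda l, r: "stereo-depth-map"
mono = lambda img: "mono-depth-map"

depth, mode = estimate_depth("L", "R", good_rectify, stereo, mono)
```

Keeping the fallback decision in one place like this makes the "fail gracefully on bad input" requirement testable in isolation from the individual models.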